SCARF: A Segmental CRF Speech Recognition System
نویسندگان
چکیده
We propose a theoretical framework for doing speech recognition with segmental conditional random fields, and describe the impleme-nation of a toolkit for experimenting with these models. This framework allows users to easily incorporate multiple detector streams into a discriminatively trained direct model for large vocabulary continuous speech recognition. The detector streams can operate at multiple scales (frame, phone, multi-phone, syllable or word) and are combined at the word level in the CRF training and decoding processes. A key aspect of our approach is that features are defined at the word level, and can thus identify long span phenomena such as the edit distance between an observed and expected sequence of detection events. Further, a wide variety of features are automatically constructed from atomic detector streams, allowing the user to focus on the creation of informative detectors. Generalization to unseen words is possible through the use of decomposable consistency features [1, 2], and our framework allows for the joint or separate training of the acoustic and language models.
منابع مشابه
SCARF: a segmental conditional random field toolkit for speech recognition
This paper describes a new toolkit SCARF for doing speech recognition with segmental conditional random fields. It is designed to allow for the integration of numerous, possibly redundant segment level acoustic features, along with a complete language model, in a coherent speech recognition framework. SCARF performs a segmental analysis, where each segment corresponds to a word, thus allowing f...
متن کاملSegmental Recurrent Neural Networks for End-to-End Speech Recognition
We study the segmental recurrent neural network for end-to-end acoustic modelling. This model connects the segmental conditional random field (CRF) with a recurrent neural network (RNN) used for feature extraction. Compared to most previous CRF-based acoustic models, it does not rely on an external system to provide features or segmentation boundaries. Instead, this model marginalises out all t...
متن کاملSegmental conditional random fields with deep neural networks as acoustic models for first-pass word recognition
Discriminative segmental models, such as segmental conditional random fields (SCRFs), have been successfully applied to speech recognition recently in lattice rescoring to integrate detectors across different levels of units, such as phones and words. However, the lattice generation has been constrained by a baseline decoder, typically a frame-based hybrid HMMDNN system, which still suffers fro...
متن کاملApplication of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition
The use of Deep Belief Networks (DBN) to pretrain Neural Networks has recently led to a resurgence in the use of Artificial Neural Network Hidden Markov Model (ANN/HMM) hybrid systems for Automatic Speech Recognition (ASR). In this paper we report results of a DBN-pretrained context-dependent ANN/HMM system trained on two datasets that are much larger than any reported previously with DBN-pretr...
متن کاملMultitask Learning with CTC and Segmental CRF for Speech Recognition
Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models. Both models define a transcription probability by marginalizing decisions about latent segmentation alternatives to derive a sequence probability: the former uses a globally normalized joint model of segment labe...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009